Akira TAMAMORI Yoshihiko NANKAKU Keiichi TOKUDA
This paper proposes a new generative model which can deal with rotational data variations by extending Separable Lattice 2-D HMMs (SL2D-HMMs). In image recognition, geometrical variations such as size, location and rotation degrade the performance, so appropriate normalization for such variations is required. SL2D-HMMs can perform an elastic matching in both horizontal and vertical directions; this makes it possible to model invariance to size and location. To deal with rotational variations, we introduce additional HMM states which represent the shifts of the state alignments among the observation lines in a particular direction. Face recognition experiments show that the proposed method significantly improves the performance on data with rotational variations.
Yasuhisa FUJII Kazumasa YAMAMOTO Seiichi NAKAGAWA
In this paper, we propose Hidden Conditional Neural Fields (HCNF) for continuous phoneme speech recognition. HCNF combines Hidden Conditional Random Fields (HCRF) with a Multi-Layer Perceptron (MLP) and inherits the merits of both: the discriminative property for sequences from HCRF and the ability to extract non-linear features from an MLP. HCNF can incorporate many types of features from which non-linear features can be extracted, and is trained by sequential criteria. We first present the formulation of HCNF and then examine three methods to further improve automatic speech recognition using HCNF: an objective function that explicitly considers training errors, a hierarchical tandem-style feature, and a deep non-linear feature extractor for the observation function. Experimental results for continuous English phoneme recognition on the TIMIT core test set and Japanese phoneme recognition on the IPA 100 test set show that HCNF can be trained realistically without any initial model and outperforms both HCRF and the triphone hidden Markov model trained with the minimum phone error (MPE) criterion.
Yuanqiang HUANG Zhongzhi LUAN Depei QIAN Zhigao DU Ting CHEN Yuebin BAI
In real-time stream processing, it is important to develop high-availability mechanisms that keep stream-based applications from being disrupted by faults arising from potential anomalies. In this paper, we present a novel online prediction technique for anomalies that may occur in the near future. Concretely, we first present a value prediction method that combines the Hidden Markov Model and the Mixture of Expert Model to predict the values of feature metrics in the near future. We then employ a Support Vector Machine for anomaly identification, that is, to identify the kind of anomaly for which an alarm is about to be raised. The purpose of our approach is to achieve a tradeoff between fault penalty and resource cost. The experimental results show that our approach achieves high accuracy for common anomaly prediction with low runtime overhead.
Kazuhiro NAKAMURA Ryo SHIMAZAKI Masatoshi YAMAMOTO Kazuyoshi TAKAGI Naofumi TAKAGI
This paper presents a memory-efficient VLSI architecture for output probability computations (OPCs) of continuous hidden Markov models (HMMs) and likelihood score computations (LSCs). These computations are the most time-consuming part of HMM-based isolated word recognition systems. We demonstrate multiple fast store-based block parallel processing (MultipleFastStoreBPP) for OPCs and LSCs and present a VLSI architecture that supports it. Compared with conventional fast store-based block parallel processing (FastStoreBPP) and stream-based block parallel processing (StreamBPP) architectures, the proposed architecture requires fewer registers and less processing time. The processing elements (PEs) used in the FastStoreBPP and StreamBPP architectures are identical to those used in the MultipleFastStoreBPP architecture. From a VLSI architectural viewpoint, a comparison shows that the proposed architecture improves on the others through efficient use of PEs and of registers for storing input feature vectors.
Muhammad Rasyid AQMAR Koichi SHINODA Sadaoki FURUI
Variations in walking speed have a strong impact on gait-based person identification. We propose a method that is robust against walking-speed variations. It is based on a combination of cubic higher-order local auto-correlation (CHLAC), gait silhouette-based principal component analysis (GSP), and a statistical framework using hidden Markov models (HMMs). The CHLAC features capture the within-phase spatio-temporal characteristics of each individual, the GSP features retain more shape/phase information for better gait sequence alignment, and the HMMs classify the ID of each gait even when walking speed changes nonlinearly. We compared the performance of our method with other conventional methods using five different databases, SOTON, USF-NIST, CMU-MoBo, TokyoTech A and TokyoTech B. The proposed method was equal to or better than the others when the speed did not change greatly, and it was significantly better when the speed varied across and within a gait sequence.
Yoshitoshi YAMASHITA Eiji OKAMOTO Yasunori IWANAMI Yozo SHOJI Morio TOYOSHIMA Yoshihisa TAKAYAMA
We propose a novel channel model of satellite-to-ground optical transmission to achieve a global-scale high-capacity communication network. In addition, we construct an effective channel coding scheme based on low-density generator matrix (LDGM) codes suitable for that channel. Because the first successful optical satellite communication demonstrations are quite recent, no practical channel model has been introduced. We analyze the results of optical transmission experiments between a ground station and the Optical Inter-orbit Communications Engineering Test Satellite (OICETS) performed by NICT and JAXA in 2008 and propose a new Markov-based practical channel model. Furthermore, using this model we design an effective long erasure code (LEC) based on LDGM to achieve high-quality wireless optical transmissions.
This paper deals with underwater target classification using synthesized active sonar signals. First, we synthesize active sonar returns from a 3D highlight model of underwater targets using a ray tracing algorithm. Then, we apply a multiaspect target classification scheme based on a hidden Markov model to classify them. A matching pursuit algorithm is used for feature extraction from the synthesized sonar signals. Experimental results for different numbers of observations and signal-to-noise ratios are presented and discussed.
Hiroki NOGUCHI Kazuo MIURA Tsuyoshi FUJINAGA Takanobu SUGAHARA Hiroshi KAWAGUCHI Masahiko YOSHIMOTO
We propose a low-memory-bandwidth, high-efficiency VLSI architecture for 60-k word real-time continuous speech recognition. Our architecture includes a cache architecture using the locality of speech recognition, beam pruning using a dynamic threshold, two-stage language model searching, a parallel Gaussian Mixture Model (GMM) architecture based on the mixture level and frame level, a parallel Viterbi architecture, and pipeline operation between Viterbi transition and GMM processing. Results show that our architecture achieves 88.24% required frequency reduction (66.74 MHz) and 84.04% memory bandwidth reduction (549.91 MB/s) for real-time 60-k word continuous speech recognition.
Hao BAI Chang-zhen HU Gang ZHANG Xiao-chuan JING Ning LI
This letter proposes a novel binary vulnerability analyzer for executable programs that is based on the Hidden Markov Model (HMM). A vulnerability instruction library (VIL) is first constructed by collecting binary frames located by double precision analysis. Executable programs are then converted into structurized code sequences with the VIL. The code sequences are essentially context-sensitive, so they can be modeled by an HMM. Finally, the HMM-based vulnerability analyzer is built to recognize potential vulnerabilities of executable programs. Experimental results show that the proposed approach achieves lower false positive and false negative rates than the latest static analyzers.
Xiaoyu QIAO Zhenhui TAN Bo AI Jiaying SONG
The spectrum handoff problem for cognitive radio systems is considered in this paper. The secondary users (SUs) can only opportunistically access the spectrum holes, i.e., the frequency channels unoccupied by the primary users (PUs). As soon as a PU appears, SUs have to vacate the channel to avoid interfering with PUs and switch to another available channel. In this paper, a prediction-based spectrum handoff scheme is proposed to reduce the negative effects (both the interference to PUs and the service blocking of SUs) during the switching time. In the proposed scheme, a hidden Markov model is used to predict the occupancy of a frequency channel. By estimating the state of the model at the next time instant, we can predict whether the frequency channel will be occupied by PUs. As a cross-layer design, the spectrum sensing performance parameters, namely the false alarm probability and the missed detection probability, are taken into account to enhance the accuracy of the channel occupancy prediction. Because the spectrum handoff performance is significantly affected by these parameters, the proposed scheme in turn feeds back into the setting of the spectrum sensing algorithm parameters. The interference to the PUs can be reduced markedly by adopting the proposed spectrum handoff scheme, at the cost of a potential increase in the switching delay of SUs. The scheme also helps SUs save broadband scan time and select an appropriate target channel so as to avoid service blocking. Numerical results demonstrate these performance improvements.
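The prediction step described in this abstract can be illustrated with a two-state (idle/busy) hidden Markov model whose emission probabilities come from the sensing error rates. This is only a minimal sketch under assumed conventions, not the authors' scheme: the function name, the parameter names and the simple forward-filter formulation are all hypothetical.

```python
def predict_busy_probability(sensing_results, p_stay_idle, p_stay_busy,
                             p_false_alarm, p_missed_detection,
                             prior_busy=0.5):
    """Return P(channel busy at the next time instant | sensing results).

    Assumed model (hypothetical, for illustration only):
      - hidden state: channel idle or busy, first-order Markov chain
      - observation: True if the sensor reports "busy"
      - P(report busy | idle) = p_false_alarm
      - P(report idle | busy) = p_missed_detection
    """
    belief_busy = prior_busy  # predicted belief for the current time instant
    for reported_busy in sensing_results:
        # Correction step: weigh the belief by the sensing likelihoods.
        if reported_busy:
            like_busy, like_idle = 1.0 - p_missed_detection, p_false_alarm
        else:
            like_busy, like_idle = p_missed_detection, 1.0 - p_false_alarm
        numerator = like_busy * belief_busy
        posterior_busy = numerator / (numerator + like_idle * (1.0 - belief_busy))
        # Prediction step: propagate one time instant through the Markov chain.
        belief_busy = (posterior_busy * p_stay_busy
                       + (1.0 - posterior_busy) * (1.0 - p_stay_idle))
    return belief_busy
```

An SU could compare the returned probability with a threshold of its choosing to decide whether to vacate the channel ahead of time; the threshold itself is a design parameter that the abstract does not specify.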
Signals received at the interrogator of an RFID system always suffer from various kinds of channel deformation factors, such as the path loss of the wireless channel, insufficient channel bandwidth resulting from multipath propagation, and the carrier frequency offset between tags and interrogators. In this paper, we propose a novel Viterbi-based algorithm for joint detection of the data sequence and compensation of the distorted signal waveform. Assuming that the transmission clock is exactly synchronized at the reader, the proposed algorithm takes advantage of the structured data-encoded waveform to represent the modulation scheme of the RFID system as a trellis diagram, so that the Viterbi algorithm can be applied to data sequence estimation. Furthermore, to compensate for the distorted symbol waveform, the proposed Jiggle-Viterbi algorithm generates two substates, each corresponding to a variant structure waveform with adjustable temporal support, so that the symbol waveform deformation can be compensated, yielding significantly better performance in terms of bit error rate. Computer simulations show that even in the presence of a moderate carrier frequency offset, the proposed approach detects the data sequence with acceptable accuracy.
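For readers unfamiliar with trellis-based sequence estimation, the following is a minimal sketch of the generic Viterbi algorithm in the log domain. It is not the Jiggle-Viterbi variant of the abstract (no substates or adjustable temporal support); the function name and the dictionary-based model layout are assumptions made for illustration.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence for `observations`.

    log_start[s], log_trans[prev][s] and log_emit[s][obs] hold log
    probabilities; this layout is a hypothetical choice for the sketch.
    """
    # Best log score of any path ending in each state at the first step.
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    backpointers = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Pick the predecessor that maximizes the accumulated path score.
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Toy two-state model: each state tends to persist and to emit "its" symbol.
STATES = ["a", "b"]
LOG_START = {"a": math.log(0.5), "b": math.log(0.5)}
LOG_TRANS = {"a": {"a": math.log(0.9), "b": math.log(0.1)},
             "b": {"a": math.log(0.1), "b": math.log(0.9)}}
LOG_EMIT = {"a": {"x": math.log(0.9), "y": math.log(0.1)},
            "b": {"x": math.log(0.1), "y": math.log(0.9)}}
decoded = viterbi(["x", "x", "y", "y"], STATES, LOG_START, LOG_TRANS, LOG_EMIT)
```

In an RFID setting the states would correspond to nodes of the modulation trellis and the emission scores to waveform match metrics, but those mappings depend on details the abstract does not give.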
Statistical speech recognition using continuous-density hidden Markov models (CDHMMs) has yielded many practical applications. However, in general, mismatches between the training data and input data significantly degrade recognition accuracy. Various acoustic model adaptation techniques using a few input utterances have been employed to overcome this problem. In this article, we survey these adaptation techniques, including maximum a posteriori (MAP) estimation, maximum likelihood linear regression (MLLR), and eigenvoice. We also present a schematic view called the adaptation pyramid to illustrate how these methods relate to each other.
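As a small illustration of one of the surveyed techniques, MAP estimation of a Gaussian mean interpolates between the prior (speaker-independent) mean and the sample mean of the adaptation data. The sketch below assumes hard frame-to-Gaussian assignment (each frame counts with occupancy 1) and a single prior weight `tau`; the function and parameter names are hypothetical.

```python
def map_adapt_mean(prior_mean, frames, tau):
    """MAP update of a Gaussian mean vector.

    prior_mean: mean from the original acoustic model
    frames:     adaptation feature vectors assigned to this Gaussian
    tau:        prior weight; larger values trust the prior more (tau > 0)
    """
    n = len(frames)
    adapted = []
    for d, mu0 in enumerate(prior_mean):
        frame_sum = sum(frame[d] for frame in frames)
        # Weighted combination of the prior mean and the observed data.
        adapted.append((tau * mu0 + frame_sum) / (tau + n))
    return adapted
```

With few frames the result stays close to the prior mean, and as the number of frames grows it approaches the sample mean, which is the usual motivation for MAP adaptation when only a few input utterances are available.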
Lei LI Bin FU Christos FALOUTSOS
Quad-core CPUs have become a common desktop configuration in today's offices. The increasing number of processors on a single chip opens new opportunities for parallel computing. Our goal is to make use of multi-core as well as multi-processor architectures to speed up large-scale data mining algorithms. In this paper, we present a general parallel learning framework, Cut-And-Stitch, for training hidden Markov chain models. In particular, we propose two model-specific variants, CAS-LDS for learning linear dynamical systems (LDS) and CAS-HMM for learning hidden Markov models (HMM). Our main contribution is a novel method to handle the data dependencies due to the chain structure of hidden variables, so as to parallelize the EM-based parameter learning algorithm. We implement CAS-LDS and CAS-HMM using OpenMP on two supercomputers and a quad-core commercial desktop. The experimental results show that parallel algorithms using Cut-And-Stitch achieve comparable accuracy and almost linear speedups over the traditional serial version.
Kazuhiro NAKAMURA Masatoshi YAMAMOTO Kazuyoshi TAKAGI Naofumi TAKAGI
In this paper, a fast and memory-efficient VLSI architecture for output probability computations of continuous Hidden Markov Models (HMMs) is presented. These computations are the most time-consuming part of HMM-based recognition systems. High-speed VLSI architectures with small registers and low-power dissipation are required for the development of mobile embedded systems with capable human interfaces. We demonstrate store-based block parallel processing (StoreBPP) for output probability computations and present a VLSI architecture that supports it. When the number of HMM states is adequate for accurate recognition, compared with conventional stream-based block parallel processing (StreamBPP) architectures, the proposed architecture requires fewer registers and processing elements and less processing time. The processing elements used in the StreamBPP architecture are identical to those used in the StoreBPP architecture. From a VLSI architectural viewpoint, a comparison shows the efficiency of the proposed architecture through efficient use of registers for storing input feature vectors and intermediate results during computation.
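To make concrete what an output probability computation involves, the sketch below evaluates the log output probability of one HMM state modeled as a Gaussian mixture with diagonal covariances. It is a plain software reference under assumed conventions (diagonal covariances, log-domain arithmetic, hypothetical function names) and says nothing about how the StoreBPP hardware organizes the computation.

```python
import math

def log_gaussian_diag(obs, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at `obs`."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(obs, mean, var)
    )

def log_output_probability(obs, weights, means, variances):
    """log b_j(o) for one state: log-sum-exp over its mixture components."""
    terms = [
        math.log(w) + log_gaussian_diag(obs, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)  # subtract the maximum for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))
```

This computation is repeated for every state and every input frame, which is why it dominates the run time of HMM-based recognizers and why the number of stored feature vectors and intermediate results matters for register usage.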
In this paper, we propose a technique for estimating the degree or intensity of emotional expressions and speaking styles appearing in speech. The key idea is based on a style control technique for speech synthesis using a multiple regression hidden semi-Markov model (MRHSMM), and the proposed technique can be viewed as the inverse of the style control. In the proposed technique, the acoustic features of spectrum, power, fundamental frequency, and duration are simultaneously modeled using the MRHSMM. Based on a maximum likelihood criterion, we derive an algorithm for estimating the explanatory variables of the MRHSMM, each of which represents the degree or intensity of emotional expressions and speaking styles appearing in the acoustic features of speech. We present experimental results that demonstrate the ability of the proposed technique on two types of speech data: simulated emotional speech and spontaneous speech with different speaking styles. The estimated values are found to correlate with human perception.
When a joint source-channel (JSC) decoder is used for source coding over noisy channels, the decoder may introduce errors even though the received data is not corrupted by channel noise, if it assumes that the channel was noisy. A novel encoder algorithm has recently been proposed to improve the performance of the communications system in this situation. In this letter, we propose another algorithm, based on a conditional entropy-constrained vector quantizer, to further improve the encoder. The proposed algorithm significantly improves the performance of the communications system when the hypothesized channel bit error rate is high.
Mohammad Nurul HUDA Hiroaki KAWASHIMA Tsuneo NITTA
This paper describes a distinctive phonetic feature (DPF) extraction method for use in a phoneme recognition system; our method has a low computation cost. This method comprises three stages. The first stage uses two multilayer neural networks (MLNs): MLNLF-DPF, which maps continuous acoustic features, or local features (LFs), onto discrete DPF features, and MLNDyn, which constrains the DPF context at the phoneme boundaries. The second stage incorporates inhibition/enhancement (In/En) functionalities to discriminate whether the DPF dynamic patterns of trajectories are convex or concave, where convex patterns are enhanced and concave patterns are inhibited. The third stage decorrelates the DPF vectors using the Gram-Schmidt orthogonalization procedure before feeding them into a hidden Markov model (HMM)-based classifier. In an experiment on Japanese Newspaper Article Sentences (JNAS) utterances, the proposed feature extractor, which incorporates two MLNs and an In/En network, was found to provide a higher phoneme correct rate with fewer mixture components in the HMMs.
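The decorrelation stage relies on Gram-Schmidt orthogonalization. The following is a minimal sketch of the generic (modified) Gram-Schmidt procedure on plain Python lists; how the paper applies it to DPF vectors is not stated in the abstract, so the function name and the handling of near-dependent vectors are assumptions.

```python
def gram_schmidt(vectors, eps=1e-12):
    """Return an orthonormal basis spanning `vectors` (modified Gram-Schmidt).

    Vectors that are (numerically) linear combinations of earlier ones
    are skipped; `eps` is a hypothetical tolerance for that check.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            # Remove the component of w along the already accepted direction b.
            coeff = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis
```

The returned vectors are mutually orthogonal with unit length, which removes linear dependence between the directions before they are passed on to a classifier.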
A novel method is proposed to track the position of an MS in mixed line-of-sight/non-line-of-sight (LOS/NLOS) conditions in a cellular network. A first-order Markov model is employed to describe the dynamic transition of LOS/NLOS conditions, which is hidden in the measurement data. The method first uses modified EKF banks to jointly estimate both the mobile state (position and velocity) and the hidden sight state based on the data collected by a single BS. A Bayesian data fusion algorithm is then applied to achieve high estimation accuracy. Simulation results show that the location errors of the proposed method are significantly smaller than the FCC requirement under different LOS/NLOS conditions. In addition, the method is robust in the parameter mismodeling test. Complexity experiments suggest that it supports real-time applications. Moreover, the algorithm is flexible enough to support different types of measurement methods and both asynchronous and synchronous observation data, which makes it especially suitable for future cooperative location systems.
Yousuke NARUSE Jun-ichi TAKADA
We address the issue of MIMO channel estimation with the aid of a priori temporal correlation statistics of the channel as well as the spatial correlation. The temporal correlations are incorporated into the estimation scheme by assuming the Gauss-Markov channel model. Under the MMSE criterion, the Kalman filter performs an iterative optimal estimation. To take advantage of the enhanced estimation capability, we focus on the problem of channel estimation from a partial channel measurement in the MIMO antenna selection system. We discuss the optimal training sequence design, and also the optimal antenna subset selection for channel measurement based on the statistics. In a highly correlated channel, the estimation works even when the measurements from some antenna elements are omitted at each fading block.
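The role of the Kalman filter under a Gauss-Markov model can be illustrated with a scalar, real-valued simplification: the channel coefficient follows h[k] = a*h[k-1] + w[k] and is observed as y[k] = h[k] + v[k]. This is only a sketch under those assumptions (one coefficient, real values, hypothetical parameter names), not the MIMO estimator of the paper.

```python
def kalman_gauss_markov(measurements, a, q, r, h0=0.0, p0=1.0):
    """Track a scalar Gauss-Markov channel coefficient.

    a: temporal correlation coefficient of the channel
    q: process noise variance (must make the predicted variance positive)
    r: measurement noise variance
    h0, p0: initial estimate and its error variance (assumed values)
    """
    h, p = h0, p0
    estimates = []
    for y in measurements:
        # Prediction from the Gauss-Markov dynamics.
        h_pred = a * h
        p_pred = a * a * p + q
        # Update with the new measurement; the gain weighs prior vs. data.
        gain = p_pred / (p_pred + r)
        h = h_pred + gain * (y - h_pred)
        p = (1.0 - gain) * p_pred
        estimates.append(h)
    return estimates
```

When a measurement is unavailable for a fading block, the update step can simply be skipped and the prediction carried forward, which is one way to picture why estimation can still work in a highly correlated channel when some measurements are omitted.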
Abdul JALIL Anwar MANZAR Tanweer A. CHEEMA Ijaz M. QURESHI
A rotation-invariant texture analysis technique is proposed that uses a novel combination of the Radon Transform (RT) and Hidden Markov Models (HMMs). Texture features are extracted with the RT, which inherently captures the directional properties of a texture. HMMs are used for classification. One HMM is trained for each texture on its feature vector, which preserves the rotational invariance of the feature vector in a more compact and useful form. Once all the HMMs have been trained, testing is done by picking any of these textures at an arbitrary orientation. In experiments on sixty textures from the Brodatz album, the best percentage of correct classification (PCC) is above 98%.